Labate and Robertson BMC Plant Biology 2012, 12:133 
http://www.biomedcentral.com/1471-2229/12/133



Plant Biology 



RESEARCH ARTICLE Open Access 



Evidence of cryptic introgression in tomato 
(Solanum lycopersicum L.) based on wild tomato
species alleles 

Joanne A Labate* and Larry D Robertson



Abstract 

Background: Many highly beneficial traits (e.g. disease or abiotic stress resistance) have been transferred into crops
through crosses with their wild relatives. The 13 recognized species of tomato (Solanum section Lycopersicon) are
closely related to each other and wild species genes have been extensively used for improvement of the crop,
Solanum lycopersicum L. In addition, the lack of geographical barriers has permitted natural hybridization between S.
lycopersicum and its closest wild relative Solanum pimpinellifolium in Ecuador, Peru and northern Chile. In order to
better understand patterns of S. lycopersicum diversity, we sequenced 47 markers ranging in length from 130 to
1200 bp (total of 24 kb) in genotypes of S. lycopersicum and wild tomato species S. pimpinellifolium, Solanum
arcanum, Solanum peruvianum, Solanum pennellii and Solanum habrochaites. Between six and twelve genotypes
were comparatively analyzed per marker. Several of the markers had previously been hypothesized as carrying wild
species alleles within S. lycopersicum, i.e., cryptic introgressions.

Results: Each marker was mapped with high confidence (e < 1 × 10^-30) to a single genomic location using BLASTN
against tomato whole genome shotgun chromosomes (SL2.40) database. Neighbor-joining trees showed high 
mean bootstrap support (86.8 ± 2.34%) for distinguishing red-fruited from green-fruited taxa for 38 of the markers. 
Hybridization and parsimony splits networks, genomic map positions of markers relative to documented 
introgressions, and historical origins of accessions were used to interpret evolutionary patterns at nine markers with 
putatively introgressed alleles. 

Conclusion: Of the 47 genetic markers surveyed in this study, four were involved in linkage drag on chromosome 
9 during introgression breeding, while alleles at five markers apparently originated from natural hybridization with 
S. pimpinellifolium and were associated with primitive genotypes of S. lycopersicum. The positive identification of
introgressed genes within crop species such as S. lycopersicum will help inform conservation and utilization of crop 
germplasm diversity, for example, facilitating the purging of undesirable linkage drag or the exploitation of novel, 
favorable alleles. 
Background

Introgression is the transfer of genes of one species into 
the gene pool of another via hybridization. As a 
phenomenon, it has been an important topic in animal 
and plant genetics research for many different reasons. 
For example, introgression has been implicated in the 
adaptation of modern humans [1] and is of concern to 
conservation biologists due to loss of integrity of wild bird
and mammal populations [2,3]. In plants, introgression is 
a key concept in studies of the risks of contamination of 
natural populations by genetically modified (GM) crops. 
More commonly in crops, favorable genes from wild rela- 
tives are intentionally transferred into breeding lines for 
cultivar development. This has been particularly valuable 
in crop species that are relatively low in genetic diversity. 

According to a review of crop introgression breeding 
[4] the major functional categories of beneficial traits 
transferred from wild species are resistance or tolerance 
to abiotic stress or disease, yield, cytoplasmic male sterility
or fertility restorers for hybrid production, and quality
traits. Among the pioneering uses of wild crop
relatives during the late 19th to early 20th centuries were
the transfer of disease resistances into grape (Vitis vinifera)
[5] and sugarcane (Saccharum officinarum) [6]. A
1986 review of 23 crops estimated that 6% of total an- 
nual economic value in the US was contributed by crop 
wild relatives [7]. For 13 major crops of global import- 
ance, it was estimated that 46 wild species have been 
used in released cultivars, and that furthermore, the 
introgression breeding approach is increasing [4]. Lack 
of information on pedigrees, unpublished activities 
within the private sector and changes in taxonomy are 
some of the factors that contribute to the uncertainty of 
the collective impacts of crop introgression breeding [4].

Some of the earliest tomato introgression breeding in
the US may have been done indirectly and unwittingly
via the French variety Merville des Marches. Recent
phenotypic data collected for Merville des Marches PI
109834 showed it to be variable in fruit size and smoothness
(http://www.ars-grin.gov/cgi-bin/npgs/acc/display.pl?1129442);
its genotype was segregating, showed population
admixture, and was an outlier based on genetic
distance relative to many other S. lycopersicum accessions
[8,9]. We postulated that these were indications of
S. pimpinellifolium in its ancestry (this idea was examined
in the current study). The Fusarium wilt-resistant
processing variety Marvel [10] was selected from Merville
des Marches in the early 1900s, and Marvel was a
parent of Marglobe released in 1925 [11], which in turn
can be found in the pedigree of many important varieties
from the 1930s through the late 1950s (H.M. Munger's
tomato pedigree chart provided by E.D. Cobb, Cornell
University, 2012). Direct introgression of tomato with
wild species in the US commenced in the 1930s concurrent
with collection expeditions to geographic centers of
origin. The first released cultivar, developed from Marglobe
× S. pimpinellifolium, was aptly named Pan American
[12]. Introgression breeding efforts of tomato
increased globally post World War II, involving the
screening of a wide range of traits and all wild tomato
species [13]. Such efforts continue to be of utmost priority
today using sophisticated tools such as introgression
libraries for gene discovery [14,15].

Compellingly, of 96 introgressed traits tallied in 
released crop cultivars for 11 species (cassava, wheat, 
millet, rice, maize, sunflower, lettuce, banana, potato, 
groundnut, tomato), 55 of them were in tomato (Solanum
lycopersicum L.); the next highest numbers were
found in rice and potato with 12 traits each [4]. The em- 
phasis and success of introgression breeding in tomato 
encompasses several factors including its intrinsically 
narrow genetic base, relative ease of crossing with
several wild taxa, production demands based on growing
conditions and market niche, its susceptibility to pests 
and pathogens, and its sensitivity to abiotic factors. In 
addition to resistance or tolerance to dozens of bacterial, 
viral, fungal, insect, and nematode pathogens, hundreds 
of favorable genes or quantitative trait loci (QTL) for 
abiotic stress resistance, flower and fruit traits, yield, and 
plant architecture have been mapped in wild tomato 
species [16] and thus hold the potential to be exploited. 

Introgression breeding carries a cost, namely, genetic
linkage of non-targeted loci that must be eliminated through
repeated backcrossing. Linkage drag can persist within a
genome despite backcrossing, especially if recombination 
is suppressed. Several examples of linkage drag in to- 
mato and other crops have been quantified using mo- 
lecular markers [17-21]. Linkage drag can denote 
favorable, deleterious or neutral alleles that become in- 
advertently incorporated into breeding lines or cultivars. 

In this study we apply the term 'cryptic introgression'
[22] to describe latent genetic variation in S. lycopersicum
that originated from wild tomato species. Various
scenarios can be evoked for its origins ranging from
linkage drag, hybridization between feral S. lycopersicum
and wild relatives, to crossing in open-pollinated popula- 
tions by wind or insect vectors with pollen of intro- 
gressed cultivars [23]. Cryptic introgression is of interest 
in germplasm collections such as those conserved at 
United States Department of Agriculture, Agricultural 
Research Service (USDA, ARS) Plant Genetic Resources 
Unit (PGRU) because it can indicate novel genetic vari- 
ation for exploitation by end-users, or conversely, reveal 
unfavorable and hence undesirable alleles with respect 
to crop improvement. 

In previous reports we hypothesized the detection of 
cryptic introgression in 5% to 10% of DNA markers that 
were resequenced in tomato germplasm panels [9,24,25]. 
The aim of the current study was to gather additional 
evidence on these alleles by resequencing and analyzing 
the same markers in several accessions of wild tomato 
species and one accession of weedy S. lycopersicum
(Table 1). Although variation within wild species gene 
pools made it impracticable to attempt to discover the 
100% identical homologous allele, the assumption was 
that introgressed alleles would be more closely related to 
the alleles of a particular wild species than to their S.
lycopersicum homologs. To identify introgressed alleles
we also used evidence from mapped locations of mar- 
kers, phenotypic descriptions, and historical origins of 
lines and accessions. 

Results and discussion 

Markers and in silico mapping 

For the 47 markers used in this study (Additional file 1: 
Table S1), nucleotide primers for PCR and sequencing

Table 1 Tomato samples analyzed in this study

Species | Accession or line | Description^a
Solanum habrochaites | PI 126445 | collected from Peru in 1937, source of Cyc-B, green-fruited
Solanum pennellii | PI 414773 | collected from Peru in 1976, source of I-1, I-3, green-fruited
Solanum peruvianum | G 32592 (LA4125) | naturally selfing, collected from Chile in 2001, green-fruited
Solanum peruvianum | LA1537 | artificially inbred from PI 128650 collected from Chile in 1938 and source of Tm-2^2, green-fruited
Solanum arcanum^b | G 32591 (LA2157) | naturally selfing, collected from Peru in 1980, source of Cm QTLs, green-fruited
Solanum pimpinellifolium | PI 370093 | traces back to Vaughan Seed Co., Chicago, USA, circa 1930, source of Cf-2, Cf-3, Pto, red-fruited
Solanum lycopersicum | PI 303801 | Peru Wild (syn. with Utah 665) from Utah Agr. Expt. Sta., source of Ve, red-fruited
Solanum lycopersicum^b | PI 99782 | Tomate, collected in 1932 from Peru, red-fruited
Solanum lycopersicum^b | PI 109834 | Merville des Marches, collected in 1935 from France, red-fruited
Solanum lycopersicum^b | PI 129026 | unnamed, collected in 1938 from Ecuador, red-fruited
Solanum lycopersicum^b | PI 129128 | unnamed, collected in 1938 from Panama, red-fruited
Solanum lycopersicum^b | PI 196297 | unnamed, collected in 1951 from Nicaragua, red-fruited
Solanum lycopersicum^b | PI 258474 | unnamed, collected in 1959 from Ecuador, red-fruited
Solanum lycopersicum^b | PI 258478 | unnamed, collected in 1959 from Peru, red-fruited
Solanum lycopersicum^b | PI 390510 | unnamed, collected in 1974 from Ecuador, red-fruited
Solanum lycopersicum^b | TA496 | from S. Tanksley, Cornell Univ., developed in 1990s from E6203 × Vendor-Tm-2^2, red-fruited

Germplasm sources of sequenced alleles in wild and cultivated tomato accessions.
^a For complete references tracing history of these germplasm sources used for introgression breeding see [26].
^b Sequences published in [25].



were originally designed against S. lycopersicum sequences
except for the Conserved Ortholog Set II (COSII or
C2) and unigene (U) markers, which were designed
against Euasterids [27,28]. Molecular markers have
shown good success rates of transferability among distantly
related wild tomato species such as S. lycopersicum
and S. pennellii (for examples see [29-31]). In the current
study, some primer pairs did not amplify or give clean or
homologous reads in every wild tomato sample. The
COSII and U markers did not outperform the other
markers in terms of successfully generating high quality
sequence data because most of them contained introns,
which sometimes carried small indels in a heterozygous
condition. Such heterozygous indels were major contributors
to poor quality reads and hence missing data.

In our current germplasm panel (Table 1), S. lycopersicum
and S. pimpinellifolium had no missing markers;
sequences were available for at least two of the four
green-fruited taxa for the final set of 47 markers. S. peruvianum
LA1537, S. peruvianum G32592, S. arcanum,
S. pennellii and S. habrochaites gave data for 32, 42, 40,
35 and 38 markers, respectively. The latter two species
are usually self-incompatible (SI) and carried greater
numbers of polymorphic markers within accessions than
the inbred accessions of S. peruvianum and S. arcanum
that were sampled. Mean SNP frequency across the
polymorphic markers was 0.0127 {n^Yl) for S. habrochaites,
0.0082 (n = 17) for S. pennellii, 0.0126 (n = 5) for
S. peruvianum LA1537, 0.0058 (n = 2) for S. peruvianum
G32592 and 0.0078 (n = 7) for Peru Wild. TA496,
Tomate and S. arcanum had no polymorphic sites in
any of the markers.

S. pimpinellifolium showed unusually high polymor- 
phism of 0.0055 {n = 17) relative to the other self- 
compatible taxa. As one explanation, this accession was 
categorized as an admixture population in a simple se- 
quence repeat (SSR) genotyping study of S. pimpinellifo- 
lium population structure [32], so naturally represents 
two dissimilar S. pimpinellifolium genomes. Because the 
seed source traces back to Vaughan Seed Co. in USA 
and Horticultural Experiment Station, Ontario, Canada 
[33], another possibility is that S. lycopersicum was 
incorporated into its pedigree by one of those entities. 
This may be evidenced by comparing the previously estimated
D, number of mutations per kb [34], between the
two species [35], which was approximately four-fold
greater than our estimate reported below.

The majority of markers were mapped with high con- 
fidence to a single map location within the genome 
(Figure 1, Additional file 1: Table S1). This is the first report
in which a whole-genome sequence [36] and web
based tools were available with which to do this for the 
expressed sequence tag (EST) -based markers [24]. Ac- 
cordingly, 12 of the EST-based markers were newly 
mapped and markers 437_2, 2189_1 and 2819_5 were 
revised with respect to previously predicted chromo- 
somal location based on identities to restriction frag- 
ment length polymorphism (RFLP) markers [25]. All 
chromosomes were represented by the 47 markers; 
numbers of markers per chromosome ranged from one 

Figure 1 Chromosomal map locations of 47 markers sequenced in this study. Nine markers with dashed outlines showed cryptic
introgressions. Also shown on the map (color) are documented introgressions used in tomato breeding that were mentioned in this report, S.
habrochaites: blue, S. lycopersicum Peru Wild: red, S. peruvianum: purple, S. pimpinellifolium: orange, S. pennellii: green.



on chromosome 12 to eight on chromosome 6. Small 
gaps were observed in the alignments but close examin- 
ation revealed that these were in masked regions rich in 
runs of poly A or poly T. Only marker 1675_1 did not 
initially provide any hits. A lowered stringency of 1 x 10' 

subsequently found the probable position. 
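
The e-value screening described above can be sketched as follows. This is a minimal illustration, assuming BLASTN was run with the standard 12-column tabular output (-outfmt 6); the function name, the example hit lines and the default cutoff are hypothetical and are not taken from the study.

```python
def hits_below_cutoff(blast_tab_lines, max_evalue=1e-30):
    """Group BLASTN tabular hits by query, keeping hits at or below max_evalue.

    Assumes the standard 12 tab-separated columns of -outfmt 6:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    The default cutoff is a placeholder, not the study's setting.
    """
    hits = {}
    for line in blast_tab_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        subject_start = int(fields[8])
        evalue = float(fields[10])
        if evalue <= max_evalue:
            hits.setdefault(query, []).append((subject, subject_start, evalue))
    return hits


# Hypothetical hit lines for two made-up markers.
example = [
    "markerA\tSL2.40ch09\t99.5\t400\t2\t0\t1\t400\t5770001\t5770400\t1e-150\t700",
    "markerA\tSL2.40ch03\t85.0\t120\t18\t0\t1\t120\t900001\t900120\t1e-10\t90",
    "markerB\tSL2.40ch06\t98.0\t300\t6\t0\t1\t300\t44700001\t44700300\t2e-120\t520",
]
hits = hits_below_cutoff(example)
# Markers with exactly one qualifying hit map to a single genomic location.
single_copy = sorted(q for q, h in hits.items() if len(h) == 1)
print(single_copy)
```

Relaxing max_evalue for a marker that returns no hits mirrors the lowered-stringency search used for marker 1675_1.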

Four markers had two BLASTN hits each. These were 
1260_2 for two tightly linked (2,849 nt apart) sequences 
on chromosome 6, 2325_3 on chromosomes 1 and 3, 
2582_1 on chromosomes 4 and 5, and 2819_5 for two 
linked (14,114 nt apart) regions on chromosome 6. A 
check of the primer regions found multiple mismatches 
in predicted primer binding sites of the secondary hit for 
three of the markers; these were assumed to have ampli- 
fied as single-copy. For marker 2819_5 the forward pri- 
mer had a single mismatch and the reverse had no 
mismatches. The predicted amplicon in the mismatch 
region was 86% identical to the reference TA496 se- 
quence. When the mismatch sequence was included in 
MEGA cluster analysis, it separated from all other (wild 
and cultivated) alleles with 87% bootstrap support, and 
the mean D among sequences increased 8-fold (D = 1.5
for n = 10 alleles versus D = 12 for n = 11 alleles). Therefore,
it is unlikely that this paralog amplified and confounded
the results. All markers had previously been
screened in our lab for amplification of single bands and 
highly homozygous sequences within S. lycopersicum 
[24,25]. Results of in silico mapping confirmed that this 
set of markers provides robust results in sampling 
single-copy S. lycopersicum genes. All sequences were 
deposited into the European Molecular Biology Laboratory
(EMBL), European Nucleotide Archive (ENA) database
as accession numbers HE977919-HE978211.
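
The primer-site check and the percent-identity comparison described for the secondary BLASTN hits can be sketched as below. This is an illustrative sketch only: the sequences are invented, the function names are hypothetical, and the identity calculation assumes the two sequences are already aligned to equal length.

```python
def count_mismatches(primer, site):
    """Number of positions at which a primer differs from its predicted binding site."""
    if len(primer) != len(site):
        raise ValueError("primer and binding site must be the same length")
    return sum(1 for a, b in zip(primer.upper(), site.upper()) if a != b)


def percent_identity(seq_a, seq_b):
    """Percent identical columns between two aligned sequences; gap columns count as differences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must be the same length")
    same = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b and a != "-")
    return 100.0 * same / len(seq_a)


# Hypothetical primer and binding sites (not the study's sequences).
primer = "ACGTTGCAAGCT"
primary_site = "ACGTTGCAAGCT"
secondary_site = "ACGATGCTAGCA"

print(count_mismatches(primer, primary_site))
print(count_mismatches(primer, secondary_site))
print(percent_identity("ACGTACGTAC", "ACGTACGTTT"))
```

A secondary site with several mismatches is unlikely to amplify, which is the reasoning applied to three of the four two-hit markers.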

Clustering patterns and divergence estimates among taxa 

PI 99782 Tomate was chosen for the current study to
represent a 'pre-introgression breeding' genotype. It
bears small, slightly ribbed and unimproved fruit with
scarring and cracking
(http://www.ars-grin.gov/cgi-bin/npgs/acc/display.pl?1127604)
and was homozygous for
the common S. lycopersicum haplotype at 48 of 50 markers
for which it had been sequenced [25]. For 38 of the
markers there was high bootstrap support for the red-fruited
clade that consisted of S. lycopersicum including
Tomate and S. pimpinellifolium alleles (Additional file 1:
Table S2). Bootstrap values ranged from 54% to 100%
(mean ± standard error = 86.8 ± 2.34%). These bootstrap
values subtracted from 100% can be interpreted as estimates
of the probability of Type I error, i.e., falsely
accepting a cluster that is not signified by the data (see
[37] for discussion).
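
The summary statistics in this paragraph (mean ± standard error of bootstrap support, and 100% minus bootstrap as a Type I error estimate) can be reproduced along the following lines. The bootstrap values below are hypothetical placeholders; the per-marker values are in Additional file 1: Table S2 and are not reproduced here.

```python
import math
import statistics


def mean_and_standard_error(values):
    """Return (mean, standard error), using the sample standard deviation."""
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return mean, standard_error


# Hypothetical bootstrap support values (%) for the red-fruited clade.
bootstrap_support = [54.0, 100.0, 92.0, 88.0, 99.0, 75.0]

mean, standard_error = mean_and_standard_error(bootstrap_support)
# Bootstrap value subtracted from 100% as a per-marker Type I error estimate.
type_i_error = [100.0 - b for b in bootstrap_support]

print(f"{mean:.1f} ± {standard_error:.2f}%")
print(type_i_error)
```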

Results underscored the close relationship of S. lycopersicum
to S. pimpinellifolium because their alleles
were frequently identical or highly similar. The placement
of green-fruited species with respect to the red-fruited
clade and each other varied among loci. This can
result from incomplete lineage sorting or introgression
[38]. An example was the placement of S. habrochaites
near S. pimpinellifolium for marker TG11 (Additional
file 2: Figure S1); this example was also reported by Nesbitt
and Tanksley [35].

Of the nine markers that did not support the red-fruited
clade, seven had previously been hypothesized as
carrying introgressions (Additional file 1: Table S2).
These are discussed below. The other two markers,
1287_1 and 2280_1, showed patterns that contained S.
peruvianum within the red-fruited clade (Additional file
2: Figure S1) seemingly due to lack of resolution. At
marker 1287_1 S. peruvianum haplotypes were only one
or two mutational steps from Tomate. These mutations
were not shared with any other taxa. At marker 2280_1
only four unique haplotypes were observed. These were
S. habrochaites, S. pennellii, S. arcanum and {TA496,
Tomate, S. pimpinellifolium, Peru Wild, G32592,
LA1537}. For the 38 markers that supported the red-fruited
clade, accepted taxonomic relationships among
tomato species were generally supported [39,40].

Divergence estimates (D) among loci are associated with
a high variance over evolutionary time due to differences
in mutation and recombination rates, selective constraints,
and influences of various factors such as random
sampling of gametes and demography. Average D ranged
from 5 for marker 4301_3 to 52 mutations per kb for marker
C2_At1g44575 (mean ± standard error = 16.5 ± 1.29).
At the aggregate scale the mean D ± standard error from
PI 99782 Tomate was as follows: TA496 D = 0.001 ±
0.0005, Peru Wild D = 0.002 ± 0.0006, S. pimpinellifolium
D = 0.002 ± 0.0004, S. arcanum D = 0.020 ± 0.0022, S. peruvianum
LA1537 D = 0.020 ± 0.0025, S. peruvianum
G32592 D = 0.022 ± 0.0024, S. habrochaites D = 0.021 ±
0.0024 and S. pennellii D = 0.022 ± 0.033.

The lack of precise resolution in distinguishing taxa by 
clustering, or by average divergence from Tomate in the 
case of the green-fruited taxa was a function of shared 
polymorphisms in many instances, e.g., at 24 markers at 
least one single nucleotide polymorphism (SNP) within a 
species was also segregating between other species pairs. 
This has been observed in other studies of crop species
and their closely related wild relatives ([41] and references
therein). In addition, random sampling contributed
to low resolution, e.g., S. peruvianum haplotypes
appeared to be derived from Tomate at marker 1287_1
based on a small number of noninformative SNPs.

Evidence for introgression 

In previous studies we reported nine markers (Table 2)
with highly diverged alleles within S. lycopersicum and
hypothesized that this was due to introgression from
wild species. These were 220_1, 437_2, 2325_3, 2534_1R
(redesigned into 2534_1b), 2486_1 [24], 2819_5,
C2_At1g73180 [25], U146140 and C2_At1g44575 [9]. Of
the 47 markers in the current study, seven of these
nine showed patterns in cladograms that did not cluster
together members of the red-fruited clade (Additional
file 2: Figure S1). Hybridization networks
(Additional file 2: Figure S1), descriptive information
of the accessions, map positions of markers, and species
origins of documented introgressed disease resistance
alleles (e.g. [16,26,42-44]) were used as total evidence to
categorize the divergent markers into putative linkage
drag during introgression breeding versus natural outcrossing
with S. pimpinellifolium (Table 2). This
categorization did not constitute proof of natural
hybridization versus introgression breeding. However, it
was a useful concept from which to synthesize independent
lines of evidence and can serve as a basis for
future hypothesis testing of the two scenarios.

Linkage drag was inferred for at least three of the four
markers at which TA496 carried an allele that was highly
divergent from all other members of the red-fruited
clade. All four mapped to chromosome 9 and spanned
from 5.77 MB to 54.71 MB (Table 2). Introgressed disease
resistance loci documented on chromosome 9 [42]
include Ve1 (0.06 MB) and Ve2 (0.05 MB) both from
Peru Wild, Frl (physical map position not annotated),
Tm-2^2 (13.62 MB) and Sw-5 (67.30 MB), all three from
S. peruvianum, and Ph-3 (66.71-66.78 MB) from S. pimpinellifolium
(Figure 1, Table 2). LA1537 was the most
closely related allele to TA496 in cladograms for the
three markers spanning 5.77 - 17.00 MB on chromosome
9, which encompasses the Tm-2^2 locus at position
13.62 MB. LA1537 is an inbred accession that was
derived from PI 128650, the original source of Tm-2^2
(Table 1).

Hybridization networks placed TA496 in various positions
for each of the three markers surrounding Tm-2^2:
near the red-fruited alleles with reticulation back towards
S. peruvianum for marker 2486_1 (Figure 2a and
Additional file 2: Figure S1), between the red-fruited
alleles and S. peruvianum with reticulation back towards
both for marker 2534_1b (Figure 2b and Additional file
2: Figure S1), and within the green-fruited wild species

Table 2 Tomato markers tested for cryptic introgression

Marker | SGN gene model, predicted protein | Chromosome location (MB) | Divergent line or accession | Genetic-distance based clustering results^a | Observations

Introgression from crop improvement
220_1 | Solyc09g014280, hydroxycinnamoyl transferase | ch09, 5.77 | TA496 | TA496 intermediate between LA1537 and S. pennellii | Major disease resistance genes on ch09 include: Ve2, 0.05 MB, (11.002 cM^b), from Peru Wild; Ve1, 0.06 MB, (11.002 cM), from Peru Wild; Cm9.1, (4.0 - 24.0 cM), from S. peruvianum; Frl, (27.0 - 37.0 cM), from S. peruvianum; Tm-2^2, 13.62 MB, (32.002 cM), from S. peruvianum; Sw-5, 67.30 MB, (78.001 cM), from S. peruvianum; Ph-3, 66.71 - 66.78 MB, (63.0 - 78.0 cM), from S. pimpinellifolium
2486_1 | Solyc09g014350, glycerol-3-phosphate acyltransferase 6 | ch09, 5.90 | TA496 | low bootstrap values overall except {TA496, LA1537} |
2534_1b | Solyc09g018790, succinic semialdehyde reductase isoform1 | ch09, 17.00 | TA496 | TA496 clustered with two S. peruvianum accessions |
437_2 | Solyc09g061440, uncharacterized protein | ch09, 54.71 | TA496 | TA496 intermediate between {S. peruvianum, S. arcanum, Peru Wild, S. pimpinellifolium, Tomate} and {S. pennellii, S. habrochaites} |

Introgression from natural hybridization with S. pimpinellifolium
2325_3^c | Solyc01g073640, alcohol dehydrogenase-3 | ch01, 70.26 | TA496, PI 258478 | red-fruited clade was supported | PI 258478 was collected from Peru in 1959, highly variable, fasciated fruit.
C2_At1g44575 | Solyc06g060340, chloroplast photosystem II-associated protein | ch06, 34.71 | PI 258478 | red-fruited clade was supported | Introgressions on ch06 include: Cyc-B, 42.29 MB, (106 cM), from S. habrochaites
2819_5^c | Solyc06g082670, ribosomal protein L10 | ch06, 44.70 | PI 258478 | red-fruited clade was split into {Peru Wild-1, Tomate, TA496} and {Peru Wild-2, S. pimpinellifolium, PI 258478} |
U146140 | Solyc06g083360, DNA-directed RNA polymerase II subunit | ch06, 45.C | PI 109834 | red-fruited clade was split into {PI 109834, S. pimpinellifolium} and {TA496, Peru Wild, Tomate} | PI 109834 Merville des Marches was collected from France in 1935.
C2_At1g73180 | Solyc08g014060, eukaryotic translation initiation factor 3 subunit 9-like protein | ch08, 3.57 | PI 129026, PI 129128, PI 196297, PI 258474, PI 390510 | {PI 196297, PI 390510} were divergent from other members of red-fruited clade | PI 196297 was collected in Nicaragua in 1951, fasciated fruit, reported as introgressed by Rick [23]; carries same allele as PI 129026 (from Ecuador, 1938, fasciated fruit), PI 129128 (from Panama, 1938, fasciated fruit), PI 258474 (from Ecuador, 1959, fasciated fruit). PI 390510 was collected in Ecuador in 1974, described as a wild cherry tomato.

Nine tomato markers previously identified as carrying highly divergent alleles within Solanum lycopersicum.
^a (Additional file 2: Figure S1).
^b Genetic linkage map positions from [45] or [46].
^c Sequence mapped to two locations, see Results and discussion.



alleles between the two S. peruvianum accessions, with
no reticulation for marker 220_1 (Figure 2c and Additional
file 2: Figure S1). The size of the Tm-2^2 introgressed
region was estimated by mapping in segregating
F2 populations [19]. RFLP marker TG101 at chromosome
9 location 50.41 MB was tightly linked (< 1 cM) to
Tm-2^2. The chromosomal position of Tm-2^2 has been
characterized as very near the centromere with extremely
repressed recombination [47]. It is therefore
probable that markers 220_1 (5.77 MB), 2486_1
(5.90 MB) and 2534_1b (17.00 MB) were part of the
introgressed segment in TA496.

The TA496 allele at marker 437_2 (54.71 MB) was not 
as definitively related to S. peruvianum and its origin 

Figure 2 Examples of splits networks at markers with cryptic introgressions in S. lycopersicum. Panels: (a) marker 2486_1, (b) marker 2534_1b,
(c) marker 220_1, (d) marker U146140, (e) marker U146140. Dashed boxes indicate accessions with
introgressed alleles (a-d: hybridization networks using marker U146437 as a control, e: parsimony splits network); a) TA496 showed reticulation
with S. peruvianum LA1537 and other green-fruited species, b) TA496 showed reticulation with red-fruited and green-fruited species, c) TA496
nested within green-fruited species, d) Merville des Marches PI 109834 was closely related to S. pimpinellifolium, which showed reticulation to
Tomate PI 99782, e) parsimony splits network used only conservative, global signal within the marker to illustrate the hybrid origin of Merville des
Marches PI 109834.

was more difficult to interpret. The cladogram placed
TA496 in an intermediate position between two clades,
namely, {S. pimpinellifolium, Tomate, S. arcanum, Peru
Wild, S. peruvianum} versus {S. habrochaites, S. pennellii}
(Additional file 2: Figure S1). The hybridization network
showed reticulation with S. habrochaites and the
red-fruited alleles (Additional file 2: Figure S1). Several
genes commonly found in tomato varieties have originated
from S. habrochaites (Cf-4, Tm-1, Ol-1, Cyc-B and
Del) or S. pennellii (I-1 and I-3) but none of these map
to chromosome 9 [43,45]. The chromosome 9 introgressions
listed in Table 2 were judged to be commonly used
in cultivars based on an informal survey of descriptions
from online seed catalogs (unpublished observations) as
well as comprehensive reviews of tomato breeding (e.g.
[16,26,42-44]).

Linkage drag of 437_2 in TA496 with Ph-3 (66.71 MB)
or Sw-5 (67.30 MB) was rejected because they originated
from S. pimpinellifolium and S. peruvianum, respectively.
Based on a BLASTN search of TA496 marker
437_2 against the Solanaceae PlantGDB-assembled
Unique Transcripts (PUTs) in the Solanaceae Genomics
Resource database (http://solanaceae.plantbiology.msu.edu/),
TA496 and S. habrochaites shared a symplesiomorphy
at nucleotide position 46 that was absent from
all other alleles that we resequenced. This provided tentative
but inconclusive evidence of a direct relationship
between the introgression and S. habrochaites. As alternative
evidence, FM6203, a progenitor of TA496, purportedly
carries Asc resistance (pers comm. S. Loewen,
University of Guelph, 2005) for which S. pennellii has
served as one of the original sources on chromosome 3
in tomato [48].

The remaining five markers with divergent alleles
showed patterns consistent with introgression from S.
pimpinellifolium in natural populations. At four markers
(2325_3, C2_At1g44575, 2819_5 and U146140) the divergent
alleles were more closely related to red-fruited
rather than green-fruited taxa with bootstrap values ranging
from 82% - 100% (Additional file 1: Table S2), although
markers 2819_5 and U146140 did not support
monophyly of red-fruited alleles (Additional file 2: Figure
S1). Hybridization networks showed red-fruited species
alleles to be distinct from green-fruited species alleles
for 2325_3 and C2_At1g44575, while 2819_5 and
U146140 showed connections between red-fruited and
green-fruited species (Additional file 2: Figure S1). Marker
U146140 nicely illustrated the S. lycopersicum × S.
pimpinellifolium hybrid origin of Merville des Marches
PI 109834 (Figures 2d, 2e). Marker C2_At1g73180
showed an unusual pattern in that PI 196297 (and three
additional accessions with the identical allele, Table 2)
and PI 390510 were divergent from all other alleles in
the cluster analysis with 99% bootstrap support
(Figure 3a and Additional file 2: Figure S1). This suggested
a potential paralog. However, no heterozygotes
were observed, the marker did not map to more than
one genomic location, and COSII markers were designed
to amplify highly conserved single copy genes [27].

The divergent C2_At1g73180 alleles were unlikely to
have originated from a green-fruited species because
chromosome 8 does not carry any introgressed disease
resistance alleles [16,42-44]. The hybridization network
depicted PI 196297 and PI 390510 as branching from S.
pimpinellifolium, with complex reticulations at the base
of the red-fruited clade that extended through the
green-fruited taxa, down to S. pennellii at the root
(Figure 3b). Among the seven unique polymorphisms
carried by PI 196297 and PI 390510, two were
non-conservative amino acid substitutions, two were
synonymous and three were intronic. One possibility is that
selection at or near this locus has caused ancient
polymorphism to have been retained. A significant HKA test
[49] (χ² = 7.36, P = 0.007) strengthened this
interpretation. Therefore, diversity at this marker showed
patterns of both natural selection and introgression.
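
As an arithmetic check on the reported HKA result, the P-value
can be recovered from the test statistic. The sketch below
assumes the statistic follows a chi-square distribution with
one degree of freedom; the degrees of freedom are not stated
in the text, so this is an assumption, and the function name
is illustrative only.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom.

    For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    """
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.erfc(math.sqrt(x / 2.0))


# Reported HKA statistic for marker C2_At1g73180.
# ASSUMPTION: one degree of freedom (not stated in the text).
p_value = chi2_sf_1df(7.36)
print(f"P = {p_value:.3f}")
```

Under that assumption the computed value rounds to the
reported P = 0.007.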

Additional evidence that these five markers represent
S. pimpinellifolium introgressions in natural populations
includes geographical origins of three accessions from
Ecuador where the two species hybridize extensively
[50], original collection of three accessions dating to
1935 and 1938 (likely precluding the influence of direct
introgression breeding), the primitive fruit phenotype of
fasciation (four accessions) or wild cherry tomato (one
accession; n.b., cherry tomato was described as an
admixture of S. lycopersicum and S. pimpinellifolium by
[35]), and previously reported introgression of PI 196297
with S. pimpinellifolium by Rick [23]. In estimates of
population structure [51] PI 109834, PI 129128, PI
258474 and PI 258478 all showed high probability of
membership in the second of two populations inferred
for S. lycopersicum genotypes, consistent with
interspecific hybridization [9].

Conclusions 

It was useful to delineate cryptic introgression within S.
lycopersicum into linkage drag stemming from breeding
versus natural hybridization with S. pimpinellifolium,
although this categorization was not definitive and should
be subjected to further scrutiny. Sequences of the wild
tomato species markers in the context of their physical
map locations have strengthened our previous
interpretations of detection of introgressed alleles in
domesticated tomato [24,25]. Genomic tools for fine resolution
of introgressed regions in crop species are increasingly
available. The strengths and weaknesses of comparative
genotyping to verify introgression have been illustrated
here using Solanum section Lycopersicon taxa.



Figure 3 Extreme divergence of marker C2_At1g73180. a) Clustering at marker C2_At1g73180 showed the distinctiveness of PI 129026 and
PI 196297 with 99% bootstrap support; b) the splits network showed a combination of introgression of PI 129026 and PI 196297 (among others,
see Table 2) with S. pimpinellifolium and retention of ancient polymorphisms that reticulated down to the root; the latter supported genetic
hitchhiking and rejection of selective neutrality.



In the current study the implications of linkage drag
of introgressed alleles on phenotype remain unknown.
At least one marker (2534_1b) codes for an enzyme
involved in fruit ripening (Table 2). In cultivars or
breeding lines it will be more useful to estimate the
proportion of a genome that harbors linkage drag. This
should be feasible for TA496 using bioinformatics given
the vast amount of public EST sequence data available
for this particular line and wild tomato species
(http://solgenomics.net/tools/blast/dbinfo.pl as of June 2012
reported 323,465 Lycopersicon mRNAs). Importantly,
an understanding of linkage drag will help to distinguish
it from selection during crop improvement. Four of six
markers at which neutrality was previously rejected (437_2,
2486_1, 2534_1b and 2819_5) (Table 2 in [25]) were
found to be introgressed rather than selected. It is
anticipated that next-generation sequencing will be
utilized to more rapidly eliminate linkage drag in crops
[52].

Natural hybrids will carry high proportions of wild 
alleles making them somewhat easy to detect using large 
numbers of molecular markers. The intrinsic value of 
naturally introgressed germplasm was recognized in 
common bean (Phaseolus vulgaris) as a source of new
alleles for traits such as disease resistance [53]. In S. 
lycopersicum, if horticultural effects of introgression are 
subtle then accessions such as Merville des Marches 
may be prime sources to screen for new alleles. A search 
of the literature for accessions that were part of the 
current study (Table 2) found that PI 129128 showed 
high lycopene content, similar to lines containing pig- 
ment mutations such as o^, hp, and dg [54]. 

Finally, it is worth noting that Rick [23] reported the
potential of natural hybridization of S. lycopersicum with
Solanum chilense, S. habrochaites (formerly Lycopersicon
hirsutum f. glabratum), and Lycopersicon peruvianum
(now revised into four species, [55]) in regions of Chile,
Ecuador and Peru where sympatric populations grow in
close contact, although he found no evidence of this
based on phenotypes and severe postzygotic barriers are
well known. Genotyping of populations sampled from
these regions would provide evidence to reexamine
whether introgression from these wild tomato species
into S. lycopersicum has played a role in the crop's
evolutionary history.

Methods 

Plant material 

Marker genotypes used in this study were previously
reported for Solanum arcanum and S. lycopersicum
[9,25] or were newly collected from each of five wild
tomato accessions and one weedy S. lycopersicum
accession, Peru Wild. The specific accessions of S.
habrochaites, S. pennellii, S. pimpinellifolium and Peru Wild
have served historically as sources of important disease
resistance alleles for tomato cultivars (Table 1). The
two S. peruvianum accessions were chosen because
they were known to be inbred and predicted to be
highly homozygous. One of these, S. peruvianum
LA1537, originated from accession PI 128650 which
was the original source of Tm-2² [56,57]. Two plants
from each of the six accessions were sampled as
seedlings for genomic DNA isolation and sequencing of
markers. Three accessions included in this study had
previously published sequences for the markers [25]
(Table 1). These were a breeding line with documented
multiple introgressions in its pedigree including
Tm-2² (TA496) [57], an accession that predated tomato
introgression breeding (Tomate, PI 99782), and S.
arcanum (G 32591), a naturally self-fertilizing accession
formerly classified as Lycopersicon peruvianum. For
a few markers, published sequences from additional
S. lycopersicum accessions (PI 109834 Merville des
Marches, PI 129026, PI 129128, PI 196297, PI 258474,
PI 258478, PI 390510, Table 1) were included in
analyses because they were previously reported as carrying
highly divergent alleles, i.e. putative introgressions, at
those loci [9,25].

DNA sequences 

Genomic DNA extraction from seedlings, PCR
amplification and two-pass sequencing were as described in
Labate et al. [28]. Initially, 49 of 50 markers from Labate
et al. [9] were sequenced. These represent random loci
including expressed genes (expressed sequence tag,
EST-based), highly conserved genes (COSII and U) and
arbitrary loci. Marker 1523_4 was excluded without testing
because it tended to give poor quality sequence within
S. lycopersicum. Markers 175_1 and 1909_2 were
dropped during this study because they did not
consistently amplify S. lycopersicum homologs in wild tomato
species, leaving 47 markers representing approximately
24 kb in total (Additional file 1: Table S1). Software
packages phred, phrap and Consed [58,59] and Staden
[60] were used for assembly and base calling of reads.
Pregap4 (ver. 1.5) of the Staden package was configured
to apply a base-calling algorithm, "Estimate Base
Accuracies", that is different from phred in order to
independently verify the data. Sequence data were trimmed to
remove primer binding sites and low quality ends
(phred<40), and manually aligned in BioEdit [61]. All
SNPs and heterozygous positions were confirmed by
visual examination of trace files by two people. If the two
plants from one accession had different sequences they
were kept distinct; if they were identical they were
treated as a single representative sequence of that accession.
Heterozygous sites were manually edited to use IUPAC
nucleotide ambiguity codes. GeneSeqer (ver. 08 Oct.
2008) [62] was used to compare exon and intron
prediction for all markers against previous annotations [28].
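
The two bookkeeping rules described above (IUPAC coding of
heterozygous sites, and collapsing identical sequences from
the two plants of an accession) can be sketched as follows.
This is a minimal illustration only: the editing in this
study was done manually, the function names are hypothetical,
and only the standard two-base IUPAC codes are handled.

```python
# Standard IUPAC codes for two-base ambiguities (heterozygous diploid sites).
IUPAC_HET = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def encode_site(allele1, allele2):
    """Return the base itself if homozygous, else the IUPAC ambiguity code.

    Raises KeyError for anything other than A, C, G or T (e.g. gaps or N).
    """
    a, b = allele1.upper(), allele2.upper()
    if a == b:
        return a
    return IUPAC_HET[frozenset((a, b))]


def encode_genotype(hap1, hap2):
    """Combine two aligned haplotypes into one IUPAC-coded sequence."""
    if len(hap1) != len(hap2):
        raise ValueError("haplotypes must be aligned to the same length")
    return "".join(encode_site(a, b) for a, b in zip(hap1, hap2))


def representative_sequences(plant_seqs):
    """Keep distinct sequences; identical ones collapse to one representative."""
    unique = []
    for seq in plant_seqs:
        if seq not in unique:
            unique.append(seq)
    return unique
```

For example, encode_genotype("ACGT", "ACAT") yields "ACRT",
and two identical plant sequences collapse to a single entry.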

All sequences were mapped using BLASTN [63]
against tomato whole genome shotgun chromosomes
(SL2.40) database with an e-value threshold of 1 × 10⁻³⁰
on the SGN web site [64]. Gene models within markers
and adjacent regions were identified in the SGN genome
browser using ITAG2.3 Release: genomic annotations.
For marker 437_2, tomato sequences were compared to
transcribed sequences from an evolutionary outgroup
(potato, Solanum tuberosum) by BLASTN searches of the
Solanaceae Genomics Resource database at Michigan
State University (http://solanaceae.plantbiology.msu.edu/).
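
The mapping criterion can be illustrated with a short filter
over BLASTN hits. The searches in this study were run on the
SGN web site; the sketch below instead assumes the hits are
available in the standard 12-column tabular layout (query,
subject, percent identity, alignment length, mismatches, gap
opens, query start, query end, subject start, subject end,
e-value, bit score), which is an assumption. It also
simplifies "single genomic location" to "all passing hits on
one subject sequence". Function names are hypothetical.

```python
from collections import defaultdict

E_THRESHOLD = 1e-30  # e-value cutoff described in the text


def passing_hits(tabular_text, e_threshold=E_THRESHOLD):
    """Group hits with e-value below the threshold by query marker."""
    hits = defaultdict(list)
    for line in tabular_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        if len(fields) < 12:
            raise ValueError(f"expected 12 tab-separated columns: {line!r}")
        query, subject = fields[0], fields[1]
        s_start, s_end = int(fields[8]), int(fields[9])
        evalue = float(fields[10])
        if evalue < e_threshold:
            # Store subject coordinates in ascending order regardless of strand.
            hits[query].append(
                (subject, min(s_start, s_end), max(s_start, s_end), evalue)
            )
    return dict(hits)


def single_location_markers(hits):
    """Markers whose passing hits all fall on one subject sequence.

    SIMPLIFICATION: hits on the same chromosome are not checked for proximity.
    """
    return sorted(
        query
        for query, query_hits in hits.items()
        if len({hit[0] for hit in query_hits}) == 1
    )
```

A marker with one strong hit and a second hit above the
threshold is still reported as single-location, whereas a
marker with strong hits on two chromosomes is not.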

Statistical analyses 

For each of the 47 markers, relationships among
genotypes (also referred to as taxa) were first examined by
applying the neighbor-joining (NJ) clustering method
[65] as implemented in MEGA 4.0.2 [66] with 1,000
bootstrap replicates. Genetic distance and average
evolutionary divergence (D) were estimated using the
Jukes-Cantor method [34]; positions with alignment gaps or
missing data were eliminated in pairwise sequence
comparisons. For each of 11 markers that did not support
the red-fruited clade based on MEGA results, consensus
trees were generated by Phylip ver. 3.69 using Seqboot
to produce 100 datasets by bootstrap resampling,
Dnadist to estimate genetic distances using the Jukes-Cantor
method, Neighbor to produce unrooted NJ trees and
Consense to compute a consensus tree by the
majority-rule consensus tree method [67]. SplitsTree4 ver. 4.12.3
[68] was used to create splits networks from DNA
sequences or NJ trees [69]. Hybridization splits networks
were created using the consensus tree of marker
U146437 as a control that highly supported the
red-fruited clade (99%) plus the consensus tree of the marker
being tested for an introgression, with balanced sets of
taxa (same taxa in each tree). A neutrality test [49] of
marker C2_At1g73180 was carried out in DnaSP v. 5.10
[70] using marker TG11 as a control and S. arcanum as
the outgroup.
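
For reference, the Jukes-Cantor distance with pairwise
deletion can be written as d = -(3/4) ln(1 - 4p/3), where p
is the proportion of differing sites among the positions
compared. The sketch below is an illustration, not the MEGA
or Phylip implementation; it treats any character other than
A, C, G or T (gaps, N, IUPAC ambiguity codes) as missing,
which is a simplifying assumption.

```python
import math

VALID_BASES = set("ACGT")


def jukes_cantor_distance(seq1, seq2):
    """Jukes-Cantor distance between two aligned sequences.

    Positions where either sequence has a gap, missing data or an
    ambiguity code are skipped (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in VALID_BASES or b not in VALID_BASES:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    if differences == 0:
        return 0.0
    p = differences / compared
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

With one difference in ten compared sites (p = 0.1) the
corrected distance is approximately 0.107, slightly larger
than the uncorrected proportion.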

Additional files